---
title: "Example: Agent-Driven Metadata Filtering | RAG"
description: Example of using a Mastra agent in a RAG system to construct and apply metadata filters for document retrieval.
---

import GithubLink from "@site/src/components/GithubLink";

# Agent-Driven Metadata Filtering

This example demonstrates how to implement a Retrieval-Augmented Generation (RAG) system using Mastra, OpenAI embeddings, and PGVector for vector storage.
This system uses an agent to construct metadata filters from a user's query to search for relevant chunks in the vector store, reducing the amount of results returned.

## Overview

The system implements metadata filtering using Mastra and OpenAI. Here's what it does:

1. Sets up a Mastra agent with gpt-5.1 to understand queries and identify filter requirements
2. Creates a vector query tool to handle metadata filtering and semantic search
3. Processes documents into chunks with metadata and embeddings
4. Stores both vectors and metadata in PGVector for efficient retrieval
5. Processes queries by combining metadata filters with semantic search

When a user asks a question:

- The agent analyzes the query to understand the intent
- Constructs appropriate metadata filters (e.g., by topic, date, category)
- Uses the vector query tool to find the most relevant information
- Generates a contextual response based on the filtered results

## Setup

### Environment Setup

Make sure to set up your environment variables:

```bash title=".env"
OPENAI_API_KEY=your_openai_api_key_here
POSTGRES_CONNECTION_STRING=your_connection_string_here
```

### Dependencies

Then, import the necessary dependencies:

```typescript copy showLineNumbers title="index.ts"
import { Mastra } from "@mastra/core";
import { Agent } from "@mastra/core/agent";
import { ModelRouterEmbeddingModel } from "@mastra/core/llm";
import { PgVector, PGVECTOR_PROMPT } from "@mastra/pg";
import { createVectorQueryTool, MDocument } from "@mastra/rag";
import { embedMany } from "ai";
```

## Vector Query Tool Creation

Using createVectorQueryTool imported from @mastra/rag, you can create a tool that enables metadata filtering. Each vector store has its own prompt that defines the supported filter operators and syntax:

```typescript copy showLineNumbers{9} title="index.ts"
const vectorQueryTool = createVectorQueryTool({
  id: "vectorQueryTool",
  vectorStoreName: "pgVector",
  indexName: "embeddings",
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
  enableFilter: true,
});
```

Each prompt includes:

- Supported operators (comparison, array, logical, element)
- Example usage for each operator
- Store-specific restrictions and rules
- Complex query examples

## Document Processing

Create a document and process it into chunks with metadata:

```typescript copy showLineNumbers{17} title="index.ts"
const doc = MDocument.fromText(
  `The Impact of Climate Change on Global Agriculture...`,
);

const chunks = await doc.chunk({
  strategy: "recursive",
  size: 512,
  overlap: 50,
  separator: "\n",
  extract: {
    keywords: true, // Extracts keywords from each chunk
  },
});
```

### Transform Chunks into Metadata

Transform chunks into metadata that can be filtered:

```typescript copy showLineNumbers{31} title="index.ts"
const chunkMetadata = chunks?.map((chunk: any, index: number) => ({
  text: chunk.text,
  ...chunk.metadata,
  nested: {
    keywords: chunk.metadata.excerptKeywords
      .replace("KEYWORDS:", "")
      .split(",")
      .map((k) => k.trim()),
    id: index,
  },
}));
```

## Agent Configuration

The agent is configured to understand user queries and translate them into appropriate metadata filters.

The agent requires both the vector query tool and a system prompt containing:

- Metadata structure for available filter fields
- Vector store prompt for filter operations and syntax

```typescript copy showLineNumbers{43} title="index.ts"
export const ragAgent = new Agent({
  id: "rag-agent",
  name: "RAG Agent",
  model: "openai/gpt-5.1",
  instructions: `
  You are a helpful assistant that answers questions based on the provided context. Keep your answers concise and relevant.

  Filter the context by searching the metadata.

  The metadata is structured as follows:

  {
    text: string,
    excerptKeywords: string,
    nested: {
      keywords: string[],
      id: number,
    },
  }

  ${PGVECTOR_PROMPT}

  Important: When asked to answer a question, please base your answer only on the context provided in the tool.
  If the context doesn't contain enough information to fully answer the question, please state that explicitly.
  `,
  tools: { vectorQueryTool },
});
```

The agent's instructions are designed to:

- Process user queries to identify filter requirements
- Use the metadata structure to find relevant information
- Apply appropriate filters through the vectorQueryTool and the provided vector store prompt
- Generate responses based on the filtered context

> Note: Different vector stores have specific prompts available. See [Vector Store Prompts](/docs/v1/rag/retrieval#vector-store-prompts) for details.

## Instantiate PgVector and Mastra

Instantiate PgVector and Mastra with the components:

```typescript copy showLineNumbers{69} title="index.ts"
const pgVector = new PgVector({
  id: 'rag-vectors',
  connectionString: process.env.POSTGRES_CONNECTION_STRING!,
});

export const mastra = new Mastra({
  agents: { ragAgent },
  vectors: { pgVector },
});
const agent = mastra.getAgent("ragAgent");
```

## Creating and Storing Embeddings

Generate embeddings and store them with metadata:

```typescript copy showLineNumbers{78} title="index.ts"
const { embeddings } = await embedMany({
  model: new ModelRouterEmbeddingModel("openai/text-embedding-3-small"),
  values: chunks.map((chunk) => chunk.text),
});

const vectorStore = mastra.getVector("pgVector");
await vectorStore.createIndex({
  indexName: "embeddings",
  dimension: 1536,
});

// Store both embeddings and metadata together
await vectorStore.upsert({
  indexName: "embeddings",
  vectors: embeddings,
  metadata: chunkMetadata,
});
```

The `upsert` operation stores both the vector embeddings and their associated metadata, enabling combined semantic search and metadata filtering capabilities.

## Metadata-Based Querying

Try different queries using metadata filters:

```typescript copy showLineNumbers{96} title="index.ts"
const queryOne = "What are the adaptation strategies mentioned?";
const answerOne = await agent.generate(queryOne);
console.log("\nQuery:", queryOne);
console.log("Response:", answerOne.text);

const queryTwo =
  'Show me recent sections. Check the "nested.id" field and return values that are greater than 2.';
const answerTwo = await agent.generate(queryTwo);
console.log("\nQuery:", queryTwo);
console.log("Response:", answerTwo.text);

const queryThree =
  'Search the "text" field using regex operator to find sections containing "temperature".';
const answerThree = await agent.generate(queryThree);
console.log("\nQuery:", queryThree);
console.log("Response:", answerThree.text);
```

<br />
<br />
<hr className="dark:border-[#404040] border-gray-300" />
<br />
<br />
<GithubLink
  href={
    "https://github.com/mastra-ai/mastra/blob/main/examples/basics/rag/filter-rag"
  }
/>
